Abstract - This paper explores the implementation of a
part-of-speech (POS) tagger for the Arabic language using
neural computing. Arabic is one of the most important
languages in the world: more than 422 million people use it
as their primary medium for writing and speaking. POS
tagging is a crucial stage in most natural language
processing (NLP) systems. Many factors affect the
performance of a POS tagger, including the type of language,
the corpus size, the tag-set, and the computational model.
The artificial neural network (ANN) is a modern paradigm
that simulates the human ability to learn, test, and
generalize solutions. It maps non-linear functions into
simple linear models. Several researchers have implemented
POS taggers using ANNs. This review shows that ANN-based
POS tagging achieves very good results. Performance is
measured by accuracy, and most of the proposed models
reached a high accuracy, between 90% and 99%. In addition,
the neural models required fewer tag-sets for training and
testing. Most NLP applications require accurate and fast
POS tagging, which the neural model offers.

Index Terms - Machine Learning, Natural Language
Processing, Artificial Neural Network, POS, Arabic Text.

I. INTRODUCTION 

The great development of information technology has
produced a set of algorithms that help machines learn and
perform many of the functions that humans perform. Natural
languages are one of the most important means of
transmitting information and exchanging experience among
humans [1]. Arabic is one of the five most important
natural languages in the world; it is used by a large
number of Arab and Muslim people because it is the language
of the Quran. Hence the importance of studying the Arabic
language and developing programs and applications for it.
The vast amount of Arabic information on the Internet calls
for effective solutions for processing the Arabic language
[2].

Artificial neural networks (ANNs) are a set of techniques
that simulate the work of biological neural networks. These
systems learn to perform tasks from previous examples of a
system's behavior and then generalize the learned model to
unseen data that was not used before [3]. They are fast and
accurate, and they can work with systems that are governed
by a mathematical model and with little data.

An ANN is built from a group of connected units called
artificial neurons, which simulate the neurons of the
biological brain. The connections between them play the
role of biological synapses: each one transfers a signal
from one artificial neuron to another. An artificial neuron
receives and processes a signal and then transmits its
output to the other artificial neurons connected to it.
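
The signal flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a model taken from any of the cited works: the sigmoid activation, the function names, and all weight values are assumptions chosen only to show how one neuron combines its inputs and passes the result on.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias,
    followed by an activation function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def layer(inputs, weight_rows, biases):
    """A layer feeds the same inputs to several neurons and collects
    their outputs, which become the inputs of the next layer."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical weights, chosen only for illustration.
hidden = layer([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = neuron(hidden, [1.0, 1.0], -1.0)
print(hidden, output)
```

In a trained network the weights and biases are not set by hand; they are adjusted from examples, which is the learning step described in the previous paragraph.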

ANNs can handle nonlinear processes and models and
transform them into simple, controllable linear models.
They have therefore been applied in a wide range of
disciplines, including vehicle control, weather
forecasting, process control, natural resource management,
radar detection, face recognition, signal classification,
sequence recognition, handwriting recognition, medical
diagnosis, and more [4].

Part-of-speech tagging is the process of assigning each
word in a text to its matching part of speech. It is more
than listing words alongside their parts of speech: the
task is hard because some words can take more than one part
of speech depending on the context in which they occur. POS
tagging is one of the most critical text analysis tasks.
NLP tasks form a hierarchy: at the bottom are sentence and
word segmentation, POS tagging builds on top of them, and
phrase chunking builds on top of POS tags. POS tagging is
therefore considered an essential task for any NLP
application [5].
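
To make the ambiguity problem concrete, the sketch below tags two short English sentences in which the word "book" takes a different part of speech. It is a toy under stated assumptions: the lexicon, the tag names (DET, NOUN, VERB, PRON), and the single context rule are hypothetical and stand in for what a trained tagger would learn from a corpus.

```python
# Hypothetical toy lexicon: each word lists the tags it may take.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "i": ["PRON"],
    "book": ["NOUN", "VERB"],  # ambiguous word
    "flight": ["NOUN"],
    "read": ["VERB"],
}


def tag(sentence: str):
    """Return (word, tag) pairs for a sentence.

    Word segmentation is a naive whitespace split; ambiguous words are
    resolved with one context rule based on the previous tag.
    """
    tagged = []
    prev_tag = None
    for word in sentence.lower().split():
        candidates = LEXICON.get(word, ["NOUN"])  # unknown words default to NOUN
        if len(candidates) == 1:
            choice = candidates[0]
        elif prev_tag == "DET":
            # After a determiner, prefer the noun reading.
            choice = "NOUN" if "NOUN" in candidates else candidates[0]
        else:
            # Otherwise prefer the verb reading.
            choice = "VERB" if "VERB" in candidates else candidates[0]
        tagged.append((word, choice))
        prev_tag = choice
    return tagged


print(tag("I read the book"))
print(tag("Book a flight"))
```

The same word receives NOUN in the first sentence and VERB in the second, which is the kind of context dependence that makes tagging harder than a dictionary lookup.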

This paper explores the use of artificial neural networks
(ANNs) for part-of-speech tagging of the Arabic language.
ANNs have been applied successfully in many different
applications, such as text mining, text extraction and
retrieval, pattern recognition, classification and
generation of Arabic text, and speech recognition.

II. ARABIC LANGUAGE SPECIFICATIONS

Arabic is considered one of the most elegant languages in
the world today. It is one of the most widely spoken
languages, ranking fifth behind languages such as English
and Hindi. According to the latest published statistics,
more than 240,000,000 people speak Arabic as their first
language. Studies distinguish two versions of the Arabic
language. Classical Arabic is the language of the Qur'an
and is used by religious scholars in academic teaching.
Modern Standard Arabic (MSA) is used and understood in the
daily dealings of Arabic speakers around the world. It is
the primary language of writers, politicians, and the
media, and it is the variety taught in schools that teach
Arabic as a foreign language.

The Arabic alphabet consists of 28 characters, as shown
in Figure 1. Only three of these are vowels, and these
three symbols have five different variations, which means
that most characters in Arabic words are consonants
(static characters). Unlike most other natural languages,
Arabic is written in a cursive script. The Arabic alphabet
is one of the most important writing systems in the world
because of its distinctive form. It is written from right
to left, unlike Latin-script languages, which are written
from left to right. Numbers, however, are written from left
to right [6].

Figure 1: Arabic letters with Romanization

Morphology is the part of linguistics that deals with the
internal structure of words and with word formation
processes. A morpheme is the smallest meaningful unit of
language; it cannot be divided into smaller parts. There
are two types of morphemes: roots and affixes. The root is
the basic form of a word and carries its main meaning,
while affixes are added at the beginning, middle, or end of
the root to produce new words with additional meaning, as
shown in Figure 2.



Figure 2: Arabic Language Morpheme Classification 
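
The split between a stem and its affixes can be sketched as simple string matching. The example below is only an illustration under stated assumptions: the affix lists contain just the definite article and two plural endings, the function name and the minimum stem length are hypothetical, and the sketch handles prefixes and suffixes only. Affixes inserted in the middle of a root, and the recovery of the root itself, need a real morphological analyzer.

```python
# Illustrative affix lists only; a real Arabic analyzer needs many more
# entries and rules.
PREFIXES = ["ال"]          # definite article "al-"
SUFFIXES = ["ون", "ات"]    # masculine and feminine sound-plural endings


def segment(word: str, min_stem: int = 3):
    """Split a word into (prefix, stem, suffix).

    An affix is removed only if at least `min_stem` characters remain,
    so that short words are not stripped down to nothing.
    """
    prefix = next(
        (p for p in PREFIXES
         if word.startswith(p) and len(word) - len(p) >= min_stem),
        "",
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES
         if rest.endswith(s) and len(rest) - len(s) >= min_stem),
        "",
    )
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix


# "the teachers": definite article + stem + masculine plural ending
print(segment("المعلمون"))
# "book": no listed affix applies, so the word is returned unchanged
print(segment("كتاب"))
```

The result for the first word is a stem, not the three-letter root; going from the stem to the root is a separate and harder step.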

Arabic has two genders: masculine and feminine. Arabic
words can be singular, dual, or plural, and each word can
be a noun, verb, or adjective. The vowel marks of Arabic
are a distinctive feature that does not exist in Western
languages. There are three short vowels in Arabic text,
called fatHa, Damma, and kasra, which make Arabic more
complicated to write and understand than other languages.
The diacritical signs are also classified as nominal,
initial, or significant, as shown in Figure 3.

Figure 3: The short vowels in Arabic text


III. RELATED WORKS

Many researchers have implemented part-of-speech tagging
using different methods, such as rule-based [7], stochastic
[8], Support Vector Machine (SVM) [9], Hidden Markov Model
(HMM) [10], Artificial Neural Network (ANN) [11, 12], and
hybrid models [13], as shown in Figure 4.



Figure 4: Part of speech tagger models 

Only a few works implement a POS tagger for the Arabic
language based on neural approaches, as presented in
Table 1. Yousif, J. H. (2006) [14] used a recurrent neural
network (RNN) for Arabic text and obtained an accuracy of
94.74% and an MSE of 0.035697. Jabar, H. Y. (2006) [16]
implemented a multilayer perceptron (MLP) for tagging
Arabic text, which achieved an accuracy of 99.99% and an
MSE of 0.000109. Mohammed, N. F. [13] also used an MLP for
tagging Arabic text and achieved an accuracy of 92%.
Likewise, Muaidi, H. (2014) [15] applied a back-propagation
neural network (BPNN) to Arabic tag sets and achieved an
accuracy of 98.83%. Lastly, Plank, B. (2016) [17] employed
a bidirectional long short-term memory network (biLSTM)
for tagging Arabic tag sets and reached an accuracy of
97.22%.
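
The two figures reported by these works, accuracy and mean squared error (MSE), can be computed as in the sketch below. The cited papers do not state their exact formulas here, so this is an assumption: accuracy is taken as the share of tokens whose predicted tag equals the reference tag, and MSE as the mean squared difference between the network's output vector and a one-hot target vector. The function names and the sample data are hypothetical.

```python
def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the reference tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def mean_squared_error(outputs, targets):
    """Mean of squared differences over all output units and all tokens.

    `outputs` holds the network's output vector for each token and
    `targets` the matching one-hot vector of the correct tag.
    """
    total = 0.0
    count = 0
    for out_vec, tgt_vec in zip(outputs, targets):
        for o, t in zip(out_vec, tgt_vec):
            total += (o - t) ** 2
            count += 1
    return total / count


# Hypothetical data: four tokens, one of them tagged incorrectly.
gold = ["NOUN", "VERB", "NOUN", "PART"]
pred = ["NOUN", "VERB", "VERB", "PART"]
print(accuracy(pred, gold))

# Hypothetical network outputs for two tokens over a two-tag set.
outputs = [[0.9, 0.1], [0.2, 0.8]]
targets = [[1.0, 0.0], [0.0, 1.0]]
print(mean_squared_error(outputs, targets))
```

A low MSE and a high accuracy usually go together, but they measure different things: MSE reflects how close the raw outputs are to the targets, while accuracy only counts whether the final tag decision is right.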

Other researchers implemented neural methods for tagging
English text [20, 21, 22, 24, 26], as illustrated in
Table 2. They used different neural methods, including the
multilayer perceptron (MLP), the Stuttgart Neural Network
Simulator (SNNS), the Sparse Network of Winnows (SNOW),
and discrete-time recurrent neural networks (DTRNN), and
obtained accuracies between 85% and 97%. Neural taggers
have also been built for other languages: MLP-based models
for Indian [11, 19, 23], as well as models for Portuguese
[18] and Dutch [25].


Table 1: Related works on part-of-speech tagging for Arabic text
(-: value not given)

Author                Year  Location     Language  Accuracy  MSE        ANN Model
Yousif, J. H. [14]    2006  Malaysia     Arabic    94.74%    0.0356974  RNN
Jabar, H. Y. [16]     2006  Malaysia     Arabic    99.99%    0.000109   MLP
Mohammed, N. F. [13]  2012  Malaysia     Arabic    92%       -          MLP
Muaidi, H. [15]       2014  Jordan       Arabic    98.83%    -          BPNN
Plank, B. [17]        2016  Netherlands  Arabic    97.22%    -          biLSTM


Table 2: Related works on part-of-speech tagging for other languages
(-: value not given)

Author                   Year  Location     Language    Accuracy  MSE      ANN Model
Schmid, H. [20]          1994  UK           English     96.22%    -        MLP
Marques, N. C. [18]      1996  Portugal     Portuguese  97%       -        SNNS
Roth, D. [24]            1998  USA          English     97.13%    -        SNOW
Perez-Ortiz, J. A. [22]  2001  Spain        English     92%       -        DTRNN
Ahmed [26]               2002  India        English     90.4%     -        MLP
Raju, S. B. [21]         2002  India        English     85%       -        MLP
Poel, M. [25]            2008  Netherlands  Dutch       97.88%    -        MLP
Parikh, A. [23]          2009  India        Indian      95.78%    -        Multi-Neuro
Jabar H. Yousif [11]     2011  Oman         Indian      82.8%     0.00564  MLP
Narayan, R. [19]         2014  India        Indian      91.3%     -        MLP


IV. RESULT AND DISCUSSION 

This paper reviews the implementation of neural methods
in POS taggers for different languages, such as Arabic,
English, and Indian. Only a few works implement a POS
tagger for the Arabic language based on neural approaches.
Supervised neural models such as the MLP were used for
Arabic text in references [13, 14, 16], while others use
unsupervised models, such as [14, 17]. Overall, they
achieved an accuracy between 92% and 99.99% (Table 1).
Figure 5 illustrates the accuracy reported by the
researchers who implemented POS tagging for Arabic text.
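
The accuracies listed in Table 1 can be summarized with a few lines of arithmetic. The sketch below only restates the values from the table; the dictionary keys and the function name are labels chosen here for illustration. Because the works use different corpora and tag-sets, the mean is a descriptive figure and not a ranking of the models.

```python
# Accuracy values (in percent) copied from Table 1.
ARABIC_RESULTS = {
    "[14] RNN": 94.74,
    "[16] MLP": 99.99,
    "[13] MLP": 92.0,
    "[15] BPNN": 98.83,
    "[17] biLSTM": 97.22,
}


def summarize(results):
    """Return (lowest, highest, mean) of the reported accuracies."""
    values = list(results.values())
    return min(values), max(values), sum(values) / len(values)


lowest, highest, mean = summarize(ARABIC_RESULTS)
print(f"range: {lowest}% to {highest}%, mean: {mean:.2f}%")
```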

On the other hand, several researchers applied neural
methods to tagging other languages, such as English [20,
21, 22, 24, 26], Indian [11, 19, 23], Portuguese [18], and
Dutch [25]. They employed various neural schemes, such as
MLP, SNNS, SNOW, and DTRNN, and obtained good results,
with accuracies between 82.8% and 97.88% (Table 2).
Figure 6 depicts the POS tagging results for English,
Indian, Dutch, and Portuguese.

V. CONCLUSION

The main purpose of this paper is to explore and review
the work done on implementing POS tagging for Arabic text
using ANNs. The artificial neural network has been
suggested as a modern method for introducing new solutions
to the problems posed by the complexity of the Arabic
language. The comparative study showed that most
researchers used two main methodologies for implementing
automatic POS tagging: the first is rule-based, and the
second relies on statistics and computation. Other
researchers use a combination of methods (hybrid) to
implement POS tagging and obtain the desired results. POS
tagging with ANNs and genetic algorithms is a new strategy
for processing Arabic text, although these techniques have
already been employed successfully in several Arabic text
applications, such as text recognition, extracting and
determining roots and stems, and part-of-speech
prediction. This review found that there is very little
work on the use of neural networks and their techniques
for the classification of Arabic text and for POS tagging.
The primary measure of performance is accuracy, but most
researchers relied on their own texts and used different
parts of speech and tag-sets, which makes comparison a
complicated process.

Figure 5: Accuracy of works related to Arabic text



Figure 6: Accuracy of works related to other languages